Decomposition and model selection for large contingency tables.

Authors

  • Corinne Dahinden
  • Markus Kalisch
  • Peter Bühlmann
Abstract

Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, fitting such models generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros, which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables, each or some of which have many levels. We demonstrate the proposed method on simulated data and apply it to a biomedical problem in cancer research.
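The exponential growth and the sampling zeros mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration and is not the authors' decomposition method: the function names `saturated_cell_count` and `min_empty_cells` are hypothetical, and the choice of three levels per variable and 1000 observations is an assumption made for the example. It relies on two simple facts: a full table has as many cells as the product of the level counts, and n observations can occupy at most n cells.

```python
from math import prod


def saturated_cell_count(levels):
    """Number of cells in the full contingency table.

    `levels` lists the number of categories of each variable; the table
    has one cell per combination, i.e. the product of the level counts.
    """
    return prod(levels)


def min_empty_cells(levels, n_obs):
    """Lower bound on the number of zero cells.

    n_obs observations can fall into at most n_obs distinct cells, so at
    least (cells - n_obs) cells must be empty whenever cells > n_obs.
    """
    return max(0, saturated_cell_count(levels) - n_obs)


if __name__ == "__main__":
    n_obs = 1000  # assumed sample size, for illustration only
    for p in (5, 10, 20):
        levels = [3] * p  # assumed: p variables with 3 levels each
        cells = saturated_cell_count(levels)
        empty = min_empty_cells(levels, n_obs)
        print(f"p={p:2d}  cells={cells}  empty cells >= {empty}")
```

Even at ten three-level variables the table has far more cells than a sample of this size can fill, which is the situation in which the abstract notes that the maximum likelihood estimate often fails to exist.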


Related resources

Plain Answers to Several Questions about Association/Independence Structure in Complete/Incomplete Contingency Tables

In this paper, we develop some results based on the relational model (Klimova et al. 2012), which permits a decomposition of the logarithm of expected cell frequencies under a log-linear type model. These results imply plain answers to several questions in the context of analyzing contingency tables. Moreover, determination of the design matrix and hypothesis-induced matrix of the model will be discusse...


Extended Diagonal Exponent Symmetry Model and Its Orthogonal Decomposition in Square Contingency Tables with Ordered Categories

For square contingency tables with ordered categories, this article proposes new models that extend Tomizawa’s [1] diagonal exponent symmetry model. It also gives a decomposition of the proposed model and shows the orthogonality of the test statistics for the decomposed models. Examples and simulation studies based on the bivariate normal distribution are also given.


A generalized asymmetry model for square contingency tables with ordered categories

For square contingency tables with ordered categories, the present paper proposes an asymmetry model with m additional parameters, which indicates (1) generalized marginal homogeneity and (2) the structure of quasi-symmetry for cumulative probabilities. The proposed model includes a modified palindromic symmetry model by Iki, Oda and Tomizawa [7]. The present paper also gives the decomposit...


Bayesian Selection of Log-Linear Models

A general methodology is presented for finding suitable Poisson log-linear models with applications to multiway contingency tables. Mixtures of multivariate normal distributions are used to model prior opinion when a subset of the regression vector is believed to be nonzero. This prior distribution is studied for two- and three-way contingency tables, in which the regression coefficients are interpr...


Graphical Log-Linear Models: Fundamental Concepts and Applications

We present a comprehensive study of graphical log-linear models for contingency tables. High dimensional contingency tables arise in many areas such as computational biology, collection of survey and census data and others. Analysis of contingency tables involving several factors or categorical variables is very hard. To determine interactions among various factors, graphical and decomposable l...



Journal title:
  • Biometrical journal. Biometrische Zeitschrift

Volume 52, Issue 2

Pages -

Publication date 2010